Faster Adaptive Set Intersections for Text Searching

Authors

  • Jérémy Barbay
  • Alejandro López-Ortiz
  • Tyler Lu

Abstract

The intersection of large ordered sets is a common problem in the evaluation of boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001] by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corroborate and complete the practical study from Demaine et al. on comparison-based intersection algorithms. Second, we show that, in practice, replacing binary search and galloping (one-sided binary) search [4] with interpolation search improves the performance of each of the main intersection algorithms. Third, we introduce and test variants of interpolation search, which results in an even better intersection algorithm.

Topics: Evaluation of Algorithms for Realistic Environments; Implementation, Testing, Evaluation and Fine-tuning of Algorithms; Information Retrieval.
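
To make the comparison in the abstract concrete, the following is a minimal sketch of intersecting two sorted lists with a pluggable search routine. It is not the authors' implementation, and it does not reproduce the interpolation search variants the paper introduces, which the abstract does not describe. The function names (`gallop_search`, `interpolation_search`, `intersect`) are hypothetical, and the keys are assumed to be distinct integers, such as document identifiers in a posting list.

```python
from bisect import bisect_left


def gallop_search(a, x, lo):
    """Smallest index i >= lo with a[i] >= x (len(a) if none).

    Galloping (one-sided binary) search: double the step until an
    element >= x is passed, then binary search the last interval.
    """
    n = len(a)
    if lo >= n or a[lo] >= x:
        return lo
    prev, step = lo, 1          # invariant: a[prev] < x
    while lo + step < n and a[lo + step] < x:
        prev = lo + step
        step *= 2
    hi = min(lo + step, n)      # hi == n or a[hi] >= x
    return bisect_left(a, x, prev + 1, hi)


def interpolation_search(a, x, lo):
    """Same contract as gallop_search, for sorted integer keys.

    Each probe is placed where x would sit if the keys between
    a[lo] and a[hi] were evenly spread.
    """
    hi = len(a) - 1
    while lo <= hi:
        if x <= a[lo]:
            return lo
        if x > a[hi]:
            return hi + 1
        # Here a[lo] < x <= a[hi], so the denominator is positive.
        mid = lo + (x - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return lo


def intersect(small, large, search=interpolation_search):
    """Intersect two sorted lists of distinct integers.

    Each element of `small` is searched for in `large`, resuming
    from the position where the previous search stopped.
    """
    out, pos = [], 0
    for x in small:
        pos = search(large, x, pos)
        if pos == len(large):
            break               # nothing left in `large` to match
        if large[pos] == x:
            out.append(x)
            pos += 1
    return out


if __name__ == "__main__":
    a = [1, 3, 4, 7, 9, 20]
    b = list(range(0, 100, 3))
    print(intersect(a, b, search=gallop_search))
    print(intersect(a, b, search=interpolation_search))
```

Passing the search routine as a parameter mirrors the experiment the abstract describes: the intersection loop stays fixed while the underlying search is swapped. Which routine is faster depends on how the keys are distributed, so any timing comparison should be run on data that resembles the target workload.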

Similar resources

A New Chinese Text Compression Scheme Combining Dictionary Coding and Adaptive Alphabet-Character Grouping

In this paper, a new scheme is proposed for Chinese text compression. The factors, compression rate and decompression speed, are specially considered in order to help such applications as full-text searching. Actually, our scheme is based on the LZ77 scheme. The modifications made include alphabet-augmenting to obtain better compression rate, and adaptive-grouping to have faster processing spee...

Prediction of RO Membrane Performances by Use of Adaptive Network-Based Fuzzy Interference Systems

This study aims to develop an Adaptive Network-based Fuzzy Inference System technique (ANFIS) and using the parameters of a complex mathematical model in the RO membrane performances. The ANFIS was constructed by using a subtractive clustering method to generate initial fuzzy inference systems. The model trained by 70% of the data set and then its validity is examined by remained 30% data set. ...

A Family of Variable Step-Size Normalized Subband Adaptive Filter Algorithms Using Statistics of System Impulse Response

This paper presents a new variable step-size normalized subband adaptive filter (VSS-NSAF) algorithm. The proposed algorithm uses the prior knowledge of the system impulse response statistics and the optimal step-size vector is obtained by minimizing the mean-square deviation(MSD). In comparison with NSAF, the VSS-NSAF algorithm has faster convergence speed and lower MSD. To reduce the computa...

Tries for combined text and spatial data range search

We use tries to represent combined text and spatial data, and present a range search algorithm for reporting all 2-d points and rectangles from a set of size intersecting a query rectangle. Data and queries can include text. Our -d+ tries are evaluated experimentally (for up to 300,000) using uniform distributed random spatial data and randomly selected strings from a set of place names. For ra...

Enhancing GNU grep

The UNIX grep utility searches the input files selecting lines matching one or more patterns. Searching for patterns in text is an important operation in a number of domains, including program comprehension and software maintenance, structured text databases, indexing file systems, and searching natural language texts. Such a wide range of uses inspired the development of variations of the orig...

Publication date: 2006